56 research outputs found

    CHIPS: Custom Hardware Instruction Processor Synthesis

    Full text link

    High performance Boson Sampling simulation via data-flow engines

    Full text link
    In this work, we generalize the Balasubramanian-Bax-Franklin-Glynn (BB/FG) permanent formula to account for row multiplicities during the permanent evaluation, reducing the complexity of the evaluation in scenarios where such multiplicities occur. This is achieved by incorporating n-ary Gray code ordering of the addends during the evaluation. We implemented the designed algorithm on FPGA-based data-flow engines and used the developed implementation to speed up boson sampling simulations up to 40 photons, drawing samples from a 60-mode interferometer at an average rate of ~80 seconds per sample using 4 FPGA chips. We also show that the performance of our BS simulator is in line with the theoretical estimate of Clifford & Clifford [clifford2020faster], providing a way to define a single parameter that characterizes the performance of the BS simulator in a portable way. The developed design can be used to simulate both ideal and lossy boson sampling experiments. Comment: 25 pages
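
    For reference, the following is a minimal Python sketch of the plain binary Glynn (BB/FG) permanent formula with Gray-code ordering of the +/-1 vectors, i.e. the baseline that the paper generalizes. It does not reproduce the paper's actual contributions (the n-ary Gray-code handling of row multiplicities or the FPGA data-flow mapping) and is written for clarity rather than speed.

        import numpy as np

        def glynn_permanent(A):
            """Permanent of a square matrix via the BB/FG formula, O(2^(n-1) * n)."""
            A = np.asarray(A, dtype=complex)   # interferometer submatrices are complex
            n = A.shape[0]
            if n == 0:
                return 1.0 + 0j
            # Weighted column sums sum_i delta_i * A[i, j] for delta = (+1, ..., +1).
            col_sums = A.sum(axis=0)
            total = np.prod(col_sums)
            sign = 1
            gray = 0
            for k in range(1, 2 ** (n - 1)):
                new_gray = k ^ (k >> 1)                       # next binary Gray code
                flipped = (gray ^ new_gray).bit_length() - 1  # index of the changed bit
                gray = new_gray
                # Flipping delta_j between +1 and -1 shifts each weighted column sum
                # by -/+ 2 * A[j, :], so only one row is touched per step.
                direction = -2 if (new_gray >> flipped) & 1 else 2
                col_sums = col_sums + direction * A[flipped, :]
                sign = -sign                                  # parity of the delta vector
                total += sign * np.prod(col_sums)
            return total / 2 ** (n - 1)

        # Sanity check: perm([[a, b], [c, d]]) = a*d + b*c.
        print(glynn_permanent([[1, 2], [3, 4]]))  # expect (10+0j)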

    A Selection of Recent Advances in Computer Systems

    No full text
    This paper presents a selection of recent advances in computer systems. The roadmap for CMOS technology for the next ten years shows a theoretical limit of 0.1 µm for the channel of a MOSFET transistor, to be reached by 2007. Mainstream processors are adapting to multimedia applications with subword-parallel instructions like Intel's MMX or HP's MAX instruction set extensions. Coprocessors and embedded processors are moving towards VLIW in order to save hardware costs. The memory system of the future is going to be the next generation of Rambus/RDRAM. Finally, Custom Computing Machines based on Field Programmable Gate Arrays are one of the promising future technologies for computing, offering very high performance for highly parallelizable and pipelinable applications.

    Dataflow Computing for Data-Intensive Applications

    No full text

    Dynamic Circuit Generation for Boolean Satisfiability in an Object-Oriented Design Environment

    No full text
    We apply our object-oriented design environment PAM-Blox to the dynamic generation of circuits for reconfigurable computing. Our approach combines the structural hardware design environment with commercial synthesis of finite state machines (FSMs). The PAM-Blox environment features a well-defined hardware object interface and the ability to control the placement of hand-optimized circuits. We integrate the advantages of an object-oriented design environment, with full control over placement at every level of abstraction, with commercial FSM synthesis and optimization. As a driving application we consider reconfigurable hardware accelerators for the NP-complete Boolean satisfiability problem. These accelerators require fast compilation of circuits consisting of instance-specific datapaths and control automata. By providing FSM optimization and control over placement, our design environment enables the maximization of performance.
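
    PAM-Blox itself is a C++ environment, so the snippet below is only an illustrative Python sketch, with hypothetical names, of what "instance-specific datapaths" means for SAT: one concrete CNF formula is compiled into a fixed combinational checker, while a separately synthesized FSM would drive the variable assignments.

        def clause_to_expr(clause):
            """Map a DIMACS-style clause like [1, -3, 4] to an OR of (negated) wires."""
            lits = [("~x%d" % -v) if v < 0 else ("x%d" % v) for v in clause]
            return "(" + " | ".join(lits) + ")"

        def cnf_to_checker(cnf):
            """Emit one combinational expression that is 1 iff every clause is satisfied."""
            return " & ".join(clause_to_expr(c) for c in cnf)

        # Example instance: (x1 | ~x2) & (x2 | x3) & (~x1 | ~x3)
        cnf = [[1, -2], [2, 3], [-1, -3]]
        print("assign sat = " + cnf_to_checker(cnf) + ";")   # Verilog-flavoured output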

    Parallel, Pipelined CORDICs for Reconfigurable Computing

    No full text
    Reconfigurable computing has shown impressive successes with data-intensive and latency-tolerant applications. Pipelined and parallel implementations of CORDICs can achieve very high throughput for rotation and various other functions such as multiplication, division, as well as hyperbolic and other higher-order functions. Reconfiguration allows us to adapt the implementation of CORDICs and related architectures to the specific needs and properties of individual applications or specific sets of applications, hence creating application-specific CORDIC implementations. It is therefore becoming evident that CORDICs are very well suited to reconfigurable computing and custom computing machines.
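
    As a concrete reminder of what the shift-and-add iteration looks like, here is a small software sketch of rotation-mode CORDIC in Python; a pipelined FPGA version would unroll one iteration per stage and use fixed-point operands, so the float arithmetic and fixed iteration count below are simplifications.

        import math

        def cordic_rotate(x, y, angle, iterations=24):
            """Rotate (x, y) by `angle` radians using only add/subtract and shifts."""
            # Aggregate gain of the iterations; hardware folds this constant into
            # the input operands or a single final multiply.
            K = 1.0
            for i in range(iterations):
                K /= math.sqrt(1.0 + 2.0 ** (-2 * i))
            z = angle
            for i in range(iterations):
                d = 1.0 if z >= 0 else -1.0          # steer the residual angle toward zero
                x, y = x - d * y * 2.0 ** (-i), y + d * x * 2.0 ** (-i)
                z -= d * math.atan(2.0 ** (-i))      # the atan table is a small ROM in hardware
            return x * K, y * K

        # Rotating (1, 0) by 30 degrees should give roughly (0.866, 0.5).
        print(cordic_rotate(1.0, 0.0, math.radians(30)))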

    Application-Specific Number Representation

    No full text
    Reconfigurable devices, such as Field Programmable Gate Arrays (FPGAs), enable application-specific number representations. Well-known number formats include fixed-point, floating-point, the logarithmic number system (LNS), and the residue number system (RNS). Such different number representations lead to different arithmetic designs and error behaviours, thus producing implementations with different performance, accuracy, and cost. To investigate the design options in number representations, the first part of this thesis presents a platform that enables automated exploration of the number representation design space. The second part of the thesis shows case studies that optimise the designs for area, latency or throughput from the perspective of number representations. Automated design space exploration in the first part addresses the following two major issues:
    • Automation requires arithmetic unit generation. This thesis provides optimised arithmetic library generators for logarithmic and residue arithmetic units, which support a wide range of bit widths and achieve significant improvement over previous designs.
    • Generation of arithmetic units requires specifying the bit widths for each variable. This thesis describes an automatic bit-width optimisation tool called R-Tool, which combines dynamic and static analysis methods, and supports different number systems (fixed-point, floating-point, and LNS numbers).
    Putting it all together, the second part explores the effects of application-specific number representation on practical benchmarks, such as radiative Monte Carlo simulation and seismic imaging computations. Experimental results show that customising the number representations brings benefits to hardware implementations: by selecting a more appropriate number format, we can reduce the area cost by up to 73.5% and improve the throughput by 14.2% to 34.1%; by performing the bit-width optimisation, we can further reduce the area cost by 9.7% to 17.3%. On the performance side, hardware implementations with customised number formats achieve 5 to potentially over 40 times speedup over software implementations. EThOS - Electronic Theses Online Service. Sponsors: Overseas Research Students Award Scheme; UK Engineering and Physical Sciences Research Council (United Kingdom).
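
    To make the "different arithmetic designs" point concrete, the sketch below shows the textbook behaviour of a logarithmic number system: values are stored as (sign, log2|x|), so multiplication collapses to an addition while addition needs a correction term that hardware usually keeps in a lookup table. This is a generic Python illustration, not the thesis's library generators or R-Tool.

        import math

        def to_lns(x):
            """Encode a non-zero real as (sign, log2 of magnitude)."""
            return (1 if x >= 0 else -1, math.log2(abs(x)))

        def from_lns(v):
            sign, e = v
            return sign * 2.0 ** e

        def lns_mul(a, b):
            # Multiplication: multiply the signs, add the exponents.
            return (a[0] * b[0], a[1] + b[1])

        def lns_add(a, b):
            # Addition needs log2(1 + 2^(eb - ea)); hardware approximates it with a table.
            (sa, ea), (sb, eb) = sorted([a, b], key=lambda v: v[1], reverse=True)
            assert sa == sb, "same-sign case only, to keep the sketch short"
            return (sa, ea + math.log2(1.0 + 2.0 ** (eb - ea)))

        x, y = to_lns(3.0), to_lns(5.0)
        print(from_lns(lns_mul(x, y)))   # ~15.0
        print(from_lns(lns_add(x, y)))   # ~8.0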

    Application of reconfigurable CORDIC architectures

    No full text
    Very high performance architectures can be designed for data-intensive and latency-tolerant applications by maximizing parallelism and pipelining at the algorithm and bit level. This is achieved by combining technologies such as reconfigurable or adaptive computing with CORDIC-style arithmetic for computing (possibly hyperbolic) rotations, multiply, divide, and related higher-order functions (e.g. square root, multidimensional rotations). Reconfiguration allows the implementation of such functions to be adapted to the specific needs of individual applications or sets of applications, from multimedia to radar and sonar, hence creating application-specific CORDIC-style implementations. We show a high-throughput CORDIC for reconfigurable computing and a low-latency CORDIC, and discuss an application to adaptive filtering (the normalized ladder algorithm).
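
    Complementing the rotation-mode sketch shown earlier, vectoring-mode CORDIC drives the y component toward zero and so extracts a vector's magnitude and angle with the same shift-and-add datapath; this is the mode an adaptive-filtering kernel typically relies on for normalization. Again a plain Python illustration, not the paper's design.

        import math

        def cordic_vector(x, y, iterations=24):
            """Return (magnitude, angle) of (x, y); assumes x > 0 for convergence."""
            K = 1.0
            z = 0.0
            for i in range(iterations):
                K /= math.sqrt(1.0 + 2.0 ** (-2 * i))
                d = -1.0 if y >= 0 else 1.0          # steer y toward zero
                x, y = x - d * y * 2.0 ** (-i), y + d * x * 2.0 ** (-i)
                z -= d * math.atan(2.0 ** (-i))      # accumulates the rotated-away angle
            return x * K, z

        # (3, 4) should give roughly (5.0, 0.927 rad).
        print(cordic_vector(3.0, 4.0))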